draft-ietf-iafa-publishing-00.txt
INTERNET DRAFT P. Deutsch
Request for Comments: XXXX A. Emtage
FYI:XX Bunyip
Expires: March 1, 1994
August 1993
Publishing Information on the Internet with Anonymous FTP
Status of This Memo
This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its Areas, and
its Working Groups. Note that other groups may also distribute working
documents as Internet-Drafts. Internet-Drafts are draft documents valid
for a maximum of six months. Internet-Drafts may be updated, replaced,
or obsoleted by other documents at any time. It is not appropriate to
use Internet-Drafts as reference material or to cite them other than as a
"working draft" or "work in progress."
Please send comments to Alan Emtage, bajan@bunyip.com
TABLE OF CONTENTS
ABSTRACT
ACKNOWLEDGEMENTS
PREFACE
1. ADMINISTRATION
1.1 Scope of this Document
1.2 Definitions
1.3 Directory Services and Uniform Resource Identifiers
1.3.1 Variant Information
1.4 Machine vs. Human Readability
2. CONFIGURATION AND CONTENTS INFORMATION
2.1 Clusters: Common Data Elements
2.1.1 Individuals and Groups
2.1.2 Organizations
2.1.3 Resource Information
2.2 Site-Specific Configuration Information
2.2.1 Configuration Information
2.2.2 Logical Archives Configuration
2.3 Site-Specific Content Information
2.3.1 Services
2.3.2 Documents, Datasets, Mailing List Archives, Usenet Archives,
Software Packages, Images and other objects
3. INFORMATION ENCODING FOR SPECIFIC ENVIRONMENTS
3.1 Data Element Structure
3.2 Variant Fields
3.3 Data Formats
3.4 File Naming
3.4.1 Scheme 1: Multi-Record Files
3.4.2 Scheme 2: Single-Record Files
3.5 Encoding
3.6 Common Data Elements
3.6.1 Individuals or Groups
3.6.2 Organizations
3.6.3 Miscellaneous
3.7 Template Definitions
3.7.1 Site Information
3.7.2 Logical Archive Information
3.8 Content Information
3.8.1 User Information
3.8.2 Organization Information
3.8.3 Services Information
3.8.4 Documents, Datasets, Mailing List Archives, Usenet Archives,
Software Packages, Images and other objects
4. CONCLUSION
BIBLIOGRAPHY
ABSTRACT
This document specifies a range of information that your site may wish to
make available on your Anonymous FTP Archive to the Internet user
community. Automatic archive indexing tools have been created that can
gather and index this information, thus making it easier for users to
find and access it. It also may be used by the general user community for
extracting information about the archive itself, or about material
contained on the archive.
ACKNOWLEDGEMENTS
This document is the result of work done in the Internet Anonymous FTP
Archives (IAFA) working group of the IETF. Special thanks are due to
George Brett, Jill Foster, Jim Fullton, Joan Gargano, Rebecca Guenther,
John Kunze, Clifford Lynch, Pete Percival, Paul Peters, Cecilia Preston,
Peggy Seiden, Craig Summerhill, Chris Weider and Janet Vratney.
PREFACE
Over the past several years, Anonymous FTP has become the primary method
of publishing information in the Internet environment. Anonymous FTP is
an application-level service that makes use of the File Transfer Protocol
[1], one of the principal protocols of the TCP/IP suite. A well organized
and well maintained Anonymous FTP archive (AFA) can provide a relatively
cheap and simple way to distribute the software, documents, datasets,
images and other sources of information that are produced for general
availability on the network today.
Those groups wishing to set up an Anonymous FTP Archive should refer to
"A Guide to Anonymous FTP Site Administration" [2], which provides
details on why you would want to set up such an archive and what steps
are required to have a secure, well-maintained system.
This document specifies a range of information that your site may wish to
make available on the Anonymous FTP Archive to the Internet user
community. Automatic archive indexing tools have been created that can
gather and index this information, thus making it easier for users to
find and access it. It also may be used by the general user community for
extracting information about the archive itself or about material
contained on the archive. Although not required, providing such
information will make your archive a more useful resource.
It is intended that this information be made available through anonymous
FTP archives although the templates described may also be made available
through any other information access mechanism. It is beyond the scope of
this document to provide specific transformations to other mechanisms
since the individual encoding method used will necessarily depend on
several external factors such as operating systems and network protocols
used.
Section 1 of this document contains definitions of the terminology used, as
well as issues related to the use and construction of the information to
be distributed.
In Section 2 we make recommendations that are intended to provide a
standardized means for sharing information about the contents of a
specific archive site, such as services provided by the institution,
document abstracts and software descriptions. In addition, administrative
contacts, the local timezone and other site-specific details may be given.
Section 3 contains a set of encoding procedures for the information
outlined in Section 2. These procedures allow you, the AFA
administrator, to take into account site-specific issues such as whether
your particular operating system offers the capability of creating and
using subdirectories, any limitations on filename length or the inability
to use specific characters in filenames. A generic encoding method is
described, and it is expected that conventions for transforming it to
specific computing environments will be developed and become generally
known.
Interested parties should also refer to the companion document "Data
Element Templates for Internet Information Objects" [8] for full
definitions of the data templates defined in this document.
1. ADMINISTRATION
1.1 SCOPE OF THIS DOCUMENT
The templates listed below are not intended to comprehensively describe
all possible information that could be provided, but rather to cover
common, useful elements. The determination about what specific
information to provide will have to be made on a case by case basis.
Those individuals or groups completing the information have to
determine how appropriate a particular data element is for their
needs. In many cases data elements such as "home telephone number"
would not be desirable in databases open for public access.
However, in some cases they may be useful and thus have been included
in this document.
NOTE: Issues of privacy, security and maintainability should all be
considered when determining what information to provide.
This document does not mandate or require that any particular class of
information be offered. However it is hoped that those sites wishing to
offer the information described in this document adhere to the formats
recommended in Section 3.
1.2 DEFINITIONS
For the purposes of this document, the term "data element" is defined
to be a discrete (though not necessarily atomic) piece of information.
For example, a name, telephone number or postal address would all be
considered a "data element". The granularity at which a data element is
defined is determined by the purpose for which it is intended. The term
"field" is interchangeable with "data element".
"Templates" are logical groupings of one or more data elements.
Collectively the templates described in this document will be referred
to as "indexing" or "data" templates.
A "resource" is any network object being described. This could be a
"physical" object like a file, document or printer, or it may be a
"service" such as a weather or Domain Name System server. Any object
which can be referred to as being accessible or addressable on the
network is a resource.
A "record" is an instance of the template with the appropriate fields
filled in for a particular resource.
1.3 DIRECTORY SERVICES AND UNIFORM RESOURCE IDENTIFIERS
Work is currently underway for the construction of what are known as
"Uniform Resource Identifiers" [3]. These will be structured strings
whose purpose is to uniquely identify any resource on the Internet to
determine access and identification information for that resource.
This includes not only documents, software packages etc., but also
images, interactive services and physical resources. This concept has
been integrated into the data templates described below; however, no
examples of an actual URI are included.
As this document is written, there are currently no ubiquitous
directory services on the Internet. However, it is likely that in the
relatively near future such services will be tested and deployed. It
is expected that such a system will provide information for both
network resources (commonly referred to as "Yellow Pages") and
personal data ("White Pages").
The Uniform Resource Identification scheme is needed to address the
issue of identifiers for network resources. The equivalents for
personal data are commonly referred to as "handles" [4]. In this
document it is assumed that these objects are semantically equivalent.
1.3.1 VARIANT INFORMATION
Due to the lack of a universal directory service infrastructure on
the Internet, certain measures need to be taken currently to
provide additional information which such a system would make
available in a rationalized manner. These include the inclusion of
what has come to be known as "variant" information.
It is very difficult in a generic manner to determine equivalency
of the "intellectual content" of a particular resource. For
example, a document may exist both in standard preformatted ASCII
(a "text" file), and PostScript versions. The determination of
whether the two formats contain "equivalent" information must be
left to the person or group indexing (cataloging) such a document.
In order for the user searching the indexing files described in
this document to be able to ultimately locate the desired resource,
information such as location, format, character sets, languages
etc. needs to be included to provide enough context to make an
informed decision.
It is hoped and expected that the methods of information
dissemination described in this document will be superseded by a
more comprehensive system in the relatively near future.
1.4 MACHINE VS. HUMAN READABILITY
At the heart of some data element definitions is their ability to be
parsed and "understood" by computer programs. It is hoped and expected
that much of the information provided in the IAFA templates described
below will be collected and indexed by automated processes without
human intervention. As a result, care has been taken to restrict the
syntax and semantics of data element names and some values so as to
facilitate these procedures.
2. CONFIGURATION AND CONTENTS INFORMATION
In this section we define a recommended set of information that you could
make available as the administrator of an archive site. In doing so, you
would extend the functionality of your archive, as well as the
functionality of indexing and resource discovery tools that can pick up
and redistribute such information.
2.1 CLUSTERS: COMMON DATA ELEMENTS
There are certain classes of data elements, such as contact
information, which occur every time an individual, group or
organization needs to be described. Such data as names, telephone
numbers, postal and email addresses etc. fall into this category. To
avoid repeating these common elements explicitly in every template
below, we define such "clusters" here which can then be referred to in
a shorthand manner in the actual template definitions. Predefined
symbols specifying these clusters will then be used in their place
with a prefix which determines to whom or to what this information
applies.
NOTE: A handle should be used in preference to a fully expanded entry
in those situations where a handle for an individual, group or
organization can be obtained and subsequently resolved by some other
(external) method (directory service).
The following clusters have been identified.
2.1.1 INDIVIDUALS AND GROUPS
In order to describe each individual or group in a particular
template, the following common data element subcomponents are
defined. To avoid being repetitive, "individual" in this context
should be read as "individual or group".
- Name of individual
- Name of organization to which individual belongs or under whose
authority this information is being made
- Type of organization to which this individual belongs
(University, commercial organization etc.)
- Work telephone number of individual
- FAX (facsimile) telephone number of individual
- Postal address of individual
- Job title of individual (if appropriate)
- Department to which individual belongs
- Electronic mail address of individual
- Home telephone number of individual
- Home postal address of individual
- Handle
2.1.2 ORGANIZATIONS
The following elements apply when describing organizations and are a
subset of those listed above for individuals and groups. Obviously
some of the elements above (such as home phone number) make no
sense when being applied to an organization. As above, the
following may be subcomponents in a larger, hierarchically
structured data element name.
- Name of organization
- Type of organization to which this individual or group belongs
(University, commercial organization etc.)
- Postal address of organization
- Electronic mail address of organization
- Phone number of organization
- Fax number of organization
- City of organization
- State (province) of organization
- Country of organization
- Handle
2.1.3 RESOURCE INFORMATION
The following is a list of generic data element subcomponents
used when referring to particular resources.
- A complete title for the resource
- Short title
- City of resource
- State (or Province) of resource
- Country of resource
- Description
- Any keywords which might be applied to the resource that would
facilitate users' locating this information
- Type of resource
- Uniform Resource Identifier
- Comment
2.2 SITE-SPECIFIC CONFIGURATION INFORMATION
Information about your archive site itself can often be valuable to
users of your system in order for them to utilize the resource in an
efficient manner.
2.2.1 CONFIGURATION INFORMATION
Site configuration information will help users better understand
your wishes on how and when to access your AFA. This would
include such information as:
Site Information:
- Primary host name of the AFA
- A valid Domain Name System alias (CNAME) for this host [5]
- Individual contact information for site owner(s)
- Individual contact information for site maintainer
(administrators)
- Sponsoring organization contact information
- The geographical (latitude/longitude) location
- The timezone of the site
- Individual contact information for the person who last modified
this record
- The frequency with which the archive site is generally modified
- Times of preferred access for this site
- A summary of the access policies of this site. This should
include such information as preferred times of usage,
conventions or restrictions for uploading files to this site
etc.
- A brief description of the kind of information stored at this
anonymous FTP archive. If the site is intended to specialize in
a particular type of information (examples might include
software for a specific machine type, on-line copies of a
particular type of literature or research papers and
information in a particular branch of science or arts) you
should indicate this.
- Resource information as defined in the resource cluster
2.2.2 LOGICAL ARCHIVES CONFIGURATION
One physical archive site may possibly contain multiple "logical"
archives. For example, a single archive host may be shared amongst
multiple departments, each responsible for the administration of
their own part of the anonymous FTP directory subtree.
Some information (such as a host's location) will remain constant
for the site as a whole. We therefore recommend that you list
Logical Archive specific and site-specific information separately.
Logical Archive configuration:
- Individual contact information for site maintainer
(administrators)
- A valid Domain Name System alias (CNAME) for this host [5] when
referring to this logical archive
- Owning organization contact information
- Sponsoring organization contact information
- Individual contact information for the person who last modified
this record
- A summary of the access policies of this logical archive
- A summary of the type of information that this logical archive may
specialize in
- The frequency with which the archive site is generally modified
- Resource information as defined in the resource cluster
2.3 SITE-SPECIFIC CONTENT INFORMATION
The preceding collections of information make available access and
utilization policies for a site. You may also wish to make available
a selection of information about the actual contents of your archive
or the services available from your organization or institution.
The host system providing the resources need not be the same physical
site on which the descriptive information below is stored. Thus at a
University an AFA maintained by the central campus administration
could advertise services provided by individual departments that might
not have an AFA of their own. Similarly, mailing lists provided on
other administratively related hosts (such as in the same organization)
may have the indexing information available on one host while the
actual mailing list is provided by another machine.
The following categories have been identified.
2.3.1 SERVICES
- The archive can offer an overall description of each of the various
Internet services offered by your organization's systems, along
with corresponding contact information.
This description would then indicate whether the parent
organization offers such services as:
o on-line library catalogues
o Interactive online information services such as WAIS, gopher,
Prospero, World Wide Web or archie
o specialized information servers such as those providing
weather, geographic information, newswire feeds etc.
o Other information services
The following information can be made available:
- Title of service
- Short title of service
- Name of host providing service
- Protocol used by service
- Port number of service
- Required access protocol (telnet, FTP, etc.)
- Contact information for service administration
- A description of the service
- Authentication information (login name, password etc. if
required) or method for authentication (private key etc)
- Description of registration process
- Charging policies for service
- Policies & restrictions on service use
- Access times for service
- Any keywords which might be applied to the record that would
facilitate users' finding this service
- Information on last modification times of this record
- Information on last verification times of this record
- Uniform Resource Identifier
2.3.2 DOCUMENTS, DATASETS, MAILING LIST ARCHIVES, USENET ARCHIVES,
SOFTWARE PACKAGES, IMAGES AND OTHER OBJECTS
You might wish to make available a brief description of available
software, documents, images, sounds, video, datasets, USENET [6]
archives and mailing list information through the AFA.
Some of the information classes described may not be applicable to
each of the above objects.
This is NOT intended to be an official catalog entry in the sense
used by librarians. It is a simple way to describe documents and
announce their availability. More formal methods may be used
elsewhere to further describe the documents.
- Type of object
- Category (for documents this would be technical report,
conference paper etc)
- Name of object. For example, the name of the mailing list,
software package or title of the document.
- Names and other contact information on the authors
- Names and other contact information for object
maintainer/administrator
- Version designator
- Source of data
- Abstract/description of the object
- Bibliographic entry
- Citation
- Special considerations or restrictions on the object's use (e.g.,
in the case of a software package, programming
languages/environments needed, hardware restrictions, etc.).
- Publication status (For documents: draft, published etc. For
software packages: beta test, production etc.)
- Contact information of publisher
- Copyright and copying policy
- Creation date
- Appropriate keywords for this object
- Discussion forums appropriate for this object (mailing lists,
USENET newsgroups etc.)
- Format of the object (variant)
- Size (variant)
- Language (variant)
- Character set (variant)
- ISBN (variant)
- ISSN (variant)
- Method of access (anonymous FTP etc)
- Last revision date (variant)
- Library Cataloging information
- URI
3. INFORMATION ENCODING FOR SPECIFIC ENVIRONMENTS
In this section we offer a recommended encoding format for each of the
standard items of information suggested in Section 2. In most cases these
recommendations should be applicable to all environments.
We offer such a standardized format so that if such information IS to
be offered, it is formatted in such a way that it can be utilized by
automated indexing and retrieval tools. The encoding methods proposed
were developed to be extensible, so that additional information can be
offered in a similar format, if the site administrator so wishes.
Developing such recommendations offers several challenges. It is
hoped that the encoding conventions will be applicable to as wide a
variety of operating systems, file structures and encoding schemes as
possible. The globalization of the Internet also requires
attention to constraints such as the language in use at an archive site.
In addition, the encoding methods proposed must be easy to implement and,
for the moment, use existing methods of access and retrieval. We
currently assume that the site language is English and the encoding
ASCII, but it is expected that additional formats for other languages and
encoding schemes will be developed over time.
3.1 DATA ELEMENT STRUCTURE
All data elements have been defined as "attribute/value" pairs which can
be generically described as:
<data element name>: <data element value>
where <data element name> would for example be "Work-Phone" and the
<data element value> would be "+1 (514) 555 1212" (note that the double
quotes (") are not part of the strings, but serve here to delimit the
example).
The term "field name" is interchangeable with "data element name". The
term "field value" is interchangeable with "data element value".
It is intended that wherever possible and necessary, a well-defined
hierarchical structure will be used when defining data element names.
This allows them to be generally and logically extensible.
All data element names may contain only alphanumeric characters, the
hyphen ("-") and hash (number sign, pound sign "#"). No embedded
spaces are allowed. All data element names are case insensitive
although here initial letters are capitalized for readability.
Some data elements may be for internal use to the site administrator.
These field names must start with the hash character "#". All other
rules for continuation (section 3.4.1) remain the same. Such fields
should be ignored by software that indexes or otherwise processes the
templates.
Data element names without associated field values are legal in
templates.
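The naming rules above can be sketched in Python as follows. This is only an illustration under stated assumptions, not part of the recommended encoding: the function name parse_element is hypothetical, "alphanumeric" is taken to mean ASCII letters and digits, and names are folded to lower case simply as one way of honoring case insensitivity.

```python
import re

# Names may contain only alphanumerics, "-" and "#", with no embedded spaces.
# Assumption: "alphanumeric" means ASCII letters and digits.
NAME_RE = re.compile(r"[A-Za-z0-9#-]+")


def parse_element(line):
    """Split one "<name>: <value>" line into (name, value, internal).

    Hypothetical helper: `internal` is True for names beginning with "#",
    which indexing software is expected to ignore.
    """
    name, sep, value = line.partition(":")
    if not sep or not NAME_RE.fullmatch(name):
        raise ValueError(f"not a valid data element line: {line!r}")
    # Names are case insensitive, so fold them to one canonical case.
    # An empty value is legal, so no check is made on `value`.
    return name.lower(), value.strip(), name.startswith("#")


if __name__ == "__main__":
    print(parse_element("Work-Phone: +1 (514) 555 1212"))
```

An indexing tool built along these lines would skip any element whose third member is true, and would accept a name with no value after the colon.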
3.2 VARIANT FIELDS
In section 1.3.1 we describe some information as being "variant" in
that network objects may vary in "format" but are judged to have the
same "intellectual content". In the following data element definitions
we use the technique of allowing a sequence number to be appended to a
set of data elements to describe a particular variant.
For example, we have a document "War and Peace" which exists in ASCII
text, PostScript and NROFF format. The PostScript version also exists
in two natural languages, English and Russian. We define here 3 data
elements: "Filename", "Language" and "Format". In addition to the
other information stored in the indexing record for "War and Peace"
which we consider to remain constant across all variants, (like the
name of the author), we can add the following data elements:
Format-v0: PostScript
Language-v0: English
Filename-v0: war-and-peace.english.ps
Format-v1: PostScript
Language-v1: Russian
Filename-v1: war-and-peace.russian.ps
Format-v2: ASCII
Language-v2: English
Filename-v2: war-and-peace.english.txt
Format-v3: nroff
Language-v3: English
Filename-v3: war-and-peace.english.nroff
The "-v<number>" syntax allows one to repeat a set of data elements
for a particular variant and tie them all together with a common
sequence <number> so that individual instances of the particular
resource with the desired characteristics may be located.
<number> is an arbitrary number with the only restriction that all
records with that particular sequence value are logically connected
in a similar manner to that illustrated above.
The variant number need not exist when variants are not being
described and the "-v<number>" syntax may be omitted in those cases.
In the data element definitions below, the syntax "-v*" will be used
to identify those elements for which variants are allowed.
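As a rough illustration of how a tool might tie variant fields together, the following Python sketch groups fields by their "-v<number>" suffix. The function name group_variants and the "Author" field used in the usage example are assumptions made for illustration; only "Format", "Language" and "Filename" come from the example above.

```python
import re
from collections import defaultdict

# A variant field is a base name followed by "-v" and a sequence number.
VARIANT_RE = re.compile(r"(?P<base>.+)-v(?P<num>\d+)", re.IGNORECASE)


def group_variants(fields):
    """Split (name, value) pairs into common fields and per-variant fields.

    Returns (common, variants): `common` maps names without a variant
    suffix to their values, and `variants` maps each sequence number to
    a dict of the fields that carry that number.
    """
    common = {}
    variants = defaultdict(dict)
    for name, value in fields:
        match = VARIANT_RE.fullmatch(name)
        if match:
            variants[int(match.group("num"))][match.group("base")] = value
        else:
            common[name] = value
    return common, dict(variants)


if __name__ == "__main__":
    record = [
        ("Author", "Leo Tolstoy"),  # hypothetical non-variant field
        ("Format-v0", "PostScript"),
        ("Language-v0", "English"),
        ("Filename-v0", "war-and-peace.english.ps"),
        ("Format-v1", "PostScript"),
        ("Language-v1", "Russian"),
        ("Filename-v1", "war-and-peace.russian.ps"),
    ]
    common, variants = group_variants(record)
    print(common)
    print(variants[1]["Filename"])
```

A client looking for the Russian PostScript version would then scan the grouped variants for the entry whose Format and Language match, and read the Filename from that same group.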
3.3 DATA FORMATS
To facilitate the machine readability of certain data elements, the
following syntax applies.
1) All electronic mail (Email) addresses must be as defined in RFC
822, Section 6. Names and comments may be included in the Email address.
For example:
"John Doe" <jd@ftp.bar.org>
or
jd@ftp.bar.org
are valid Email addresses.
2) All hostnames are to be given as Fully Qualified Domain Names as
defined in RFC 1034, Section 3.
For example "foo.bar.com"
3) All host IP addresses are given in "dotted-quad" (or
"dotted-decimal") notation.
For example, "127.0.0.1"
4) All numeric values are in decimal unless otherwise stated.
5) Dates/times must be given as defined in RFC 822, Section 5.1 and
modified in RFC 1123, Section 5.2.14 [7]:
   date-time  =  [ day "," ] date time        ; dd mm yy
                                              ;  hh:mm:ss zzz
   day        =  "Mon" / "Tue" / "Wed" / "Thu"
              /  "Fri" / "Sat" / "Sun"
   date       =  1*2DIGIT month 2*4DIGIT      ; day month year
                                              ;  e.g. 20 Jun 82
   month      =  "Jan" / "Feb" / "Mar" / "Apr"
              /  "May" / "Jun" / "Jul" / "Aug"
              /  "Sep" / "Oct" / "Nov" / "Dec"
   time       =  hour zone                    ; ANSI
   hour       =  2DIGIT ":" 2DIGIT [":" 2DIGIT]
                                              ; 00:00:00 - 23:59:59
   zone       =  "UT" / "GMT"                 ; Universal Time
                                              ; North American : UT
              /  "EST" / "EDT"                ;  Eastern:  - 5/ - 4
              /  "CST" / "CDT"                ;  Central:  - 6/ - 5
              /  "MST" / "MDT"                ;  Mountain: - 7/ - 6
              /  "PST" / "PDT"                ;  Pacific:  - 8/ - 7
              /  ( ("+" / "-") 4DIGIT )       ; Local differential
                                              ;  hours+min. (HHMM)
For example, the string "Fri, 18 Jun 93 12:36:47 -0500" is a valid
date-time, while the string "12:36:47 GMT" is a valid time. Quoting
from RFC 1123, Section 5.2.14:
There is a strong trend towards the use of numeric timezone
indicators, and implementations SHOULD use numeric timezones
instead of timezone names. However, all implementations MUST
accept either notation. If timezone names are used, they MUST
be exactly as defined in RFC-822.
6) Time ranges (or periods) must be specified as pairs of time values
(as defined above in note (5)), separated by a "/".
Thus
12:00 GMT / 05:45 GMT
is a valid time range. Multiple time ranges are separated by
whitespace.
7) "whitespace" is defined as one or more blank (octal 40) and/or tab
(octal 11) ASCII characters.
8) References to "UT" mean Universal Time (also known as Greenwich Mean
Time or "GMT").
9) All telephone numbers are to be given as a minimum in full with
country and routing codes without separators. The number should be
given assuming someone calling internationally. The number given in
the local convention may optionally be specified.
For example,
Telephone: 1 514 875 8189 (+1-514-875-8611)
or
Telephone: 44 71 732 8011
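The date-time and time-range formats in notes (5) and (6) can be checked mechanically. The Python sketch below is one possible approach, not a reference implementation: it leans on the standard library's email.utils parser for date-times (which reads two-digit years from 69 to 99 as 1969 to 1999), and uses a regular expression written for this example to pull out time ranges. The function names are hypothetical.

```python
import re
from email.utils import parsedate_to_datetime

# Zone names and numeric offsets allowed by the grammar in note (5).
ZONE = r"(?:UT|GMT|EST|EDT|CST|CDT|MST|MDT|PST|PDT|[+-]\d{4})"
# A time is hh:mm with optional :ss, then whitespace and a zone.
TIME = r"\d\d:\d\d(?::\d\d)?\s+" + ZONE
# A time range is two times separated by "/".
RANGE_RE = re.compile("(" + TIME + r")\s*/\s*(" + TIME + ")")


def parse_date_time(text):
    """Parse an RFC 822 style date-time into a timezone-aware datetime."""
    return parsedate_to_datetime(text)


def parse_time_ranges(text):
    """Return every (start, end) time pair found in `text`, in order."""
    return [(m.group(1), m.group(2)) for m in RANGE_RE.finditer(text)]


if __name__ == "__main__":
    print(parse_date_time("18 Jun 93 12:36:47 -0500").isoformat())
    print(parse_time_ranges("12:00 GMT / 05:45 GMT 18:00 -0500 / 23:00 -0500"))
```

The sketch returns the start and end of each range as strings; whether a range such as 12:00 to 05:45 wraps past midnight is left to the caller, since the text above does not say.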
3.4 FILE NAMING
For the greatest flexibility, it is assumed that unless otherwise
stated each file containing the indexing information may reside
anywhere in the anonymous FTP subtree and in addition, any number of
these files may exist. The intention here is that they may be placed
in the same location as the information they are indexing. You, as the
administrator, are free to place these files wherever you think
appropriate in most cases. However, some files may carry information
from their place in the directory structure and therefore they may not
just be randomly placed in the archive.
One of two naming schemes may be used to provide maximum flexibility to
the archive administrator to allow use on as varied a set of hardware
and software platforms as possible. In the first case, full filenames
are used to identify indexing files. In the second, use is made of a
filename extension. It is possible to use both methods on one archive
concurrently; however care should be taken that information is not
duplicated in files named under the two conventions. The use of one
naming scheme in a consistent manner is strongly recommended.
It is expected that those administrators wishing to have one large
file containing all templates relevant to their AFA will use Scheme 1,
whereas those who prefer associating templates with individual
packages or documents in the file system will use Scheme 2. For this
reason, Scheme 2 does not allow multi-record entries in the same file.
3.4.1 SCHEME 1: MULTI-RECORD FILES
A file of the given name will exist for each category listed in
Section 2. For the sake of consistency across operating systems and
for the ability to distinguish them from non-configuration files of
the same name, the filenames will be in all uppercase letters.
In order for tools to easily identify an indexing file from the
other data files at the archive site, all indexing filenames must
begin with the four character string "AFA-".
(1) Files that may contain multiple instances of a given category
(a set of mailing lists, for example) will be divided into
records, each record containing multiple fields. Unless
otherwise specified, files may contain one or more records as
defined below and multiple records are separated by one or more
blank lines (lines which contain zero or more whitespace
characters and the NEWLINE character). The start of each field
is marked by a special fieldname on a new line in the leftmost
(first) column followed by a colon (:). All data element names
must start in the first column.
(2) Field data must be separated from fieldname by whitespace. Any
field may continue on the next line by whitespace in the first
column of that line. Multi-line fields are delimited by the
first line which does not have whitespace in the first column.
(3) Fields in the same record must not contain any blank lines
between them.
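The record rules above can be sketched in a few lines of Python. This
is only an illustrative reading of rules (1) to (3), not a reference
implementation; the function name parse_records and the decision to
join continuation lines with a single space are assumptions made for
this sketch. It is also slightly more lenient than rule (2), in that
it accepts a field whose data follows the colon with no whitespace.

```python
import re

# A fieldname starts in the first column and is followed by a colon.
FIELD_RE = re.compile(r"^([A-Za-z0-9*-]+):\s*(.*)$")


def parse_records(text):
    """Split indexing-file text into records.

    Each record is a list of (fieldname, data) pairs; a list is used
    rather than a dict because a fieldname may be repeated in a record.
    """
    records, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank (empty or whitespace-only) line ends the record.
            if current:
                records.append(current)
                current = []
            continue
        if line[0] in " \t":
            # Whitespace in the first column continues the previous field.
            if not current:
                raise ValueError("continuation line before any field")
            name, value = current[-1]
            current[-1] = (name, (value + " " + line.strip()).strip())
            continue
        match = FIELD_RE.match(line)
        if match is None:
            raise ValueError("expected 'Fieldname: data', got %r" % line)
        current.append((match.group(1), match.group(2).strip()))
    if current:
        records.append(current)
    return records
```

Given the text of a multi-record file, parse_records returns one list
per record, with multi-line fields already joined into a single string.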
3.4.2 SCHEME 2: SINGLE-RECORD FILES
An alternate method for naming indexing files is the use of the
filename extension ".AFA". Note that this is all capital letters.
Filenames in this context are defined as having 2 parts, a
"basename" and an "extension", which are combined to form the full
filename in the following manner:
<basename>.<extension>
(1) The basename under this scheme can be any arbitrary string and
is determined by the administrator. Any user or indexing tool
retrieving this indexing file needs to know what kind of record
any file contains, and so a special field (Template-Type) is
used to specify this information.
(2) No indexing file under this scheme may contain more than one
record (as defined above).
(3) Field data must be separated from fieldname by whitespace. Any
field may be continued on the next line by whitespace in the
first column of that line. Multi-line fields are delimited by
the first line which does not have whitespace in the first
column.
(4) Fields must not contain any blank lines between them.
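As an illustration of how a tool might tell the two naming schemes
apart, the following Python sketch classifies a pathname by its final
component. The function name index_file_scheme and the sample paths in
the usage note are hypothetical; the only rules taken from the text are
the uppercase "AFA-" prefix of Scheme 1 and the uppercase ".AFA"
extension of Scheme 2.

```python
import posixpath


def index_file_scheme(path):
    """Return 1 or 2 for the naming scheme a path follows, else None."""
    name = posixpath.basename(path)
    # Scheme 1: the full filename is uppercase and begins with "AFA-".
    if name.startswith("AFA-") and name == name.upper():
        return 1
    # Scheme 2: <basename>.<extension> with the extension ".AFA".
    base, ext = posixpath.splitext(name)
    if base and ext == ".AFA":
        return 2
    return None
```

For example, index_file_scheme("pub/yeast/AFA-OBJECT") yields 1 and
index_file_scheme("pub/yeast/yeast-homeobox1.AFA") yields 2, while a
lowercase name such as "afa-object" is not treated as an indexing file.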
3.5 ENCODING
Indexing files should be made world readable. It is assumed that size
and modification times can be obtained through the existing FTP
mechanism and are operating system specific.
The advantage of this system is that this information need only be
constructed once, with infrequent updates as changes occur.
Several of these files may never change during the lifetime of the host
as an anonymous FTP site. They require no special programs or
protocols to construct: a text editor is all that is needed.
The filename for those sites using the first naming scheme is given at
the top of the definition. This is not part of the definition.
NOTE: In the definitions below, the fields are separated by blank lines
ONLY to improve readability; these lines must not occur in an
actual record (see example below).
Below are listed the suggested templates for each type of information
described in part 2.
3.6 COMMON DATA ELEMENTS
As described in Section 2, there are a number of data elements which are
often needed and which form a natural grouping for certain kinds of
information ("clusters"). Below we define the data element names and
semantics of these clusters.
These clusters are intended to provide the lowest level in the
hierarchical structure of data element names. For example, contact
information for the authors of a document would be preceded by the
string "Author-" thus forming data elements of "Author-Name",
"Author-Postal", "Author-Fax", etc.
3.6.1 INDIVIDUALS OR GROUPS
Data Element Name Description
Name Name of individual
Organization-Name Name of organization to which individual
belongs or under whose authority
this information is being made
Organization-Type Type of organization to which this
individual belongs (University,
commercial organization etc.)
Work-Phone Work telephone number of individual
Work-Fax FAX (facsimile) telephone number of
individual
Work-Postal Postal address of individual
Job-Title Job title of individual (if appropriate)
Department Department to which individual belongs
Email Electronic mail address of individual
Handle Unique identifier for this record
Home-Phone Home telephone number of individual
Home-Postal Home postal address of individual
Home-Fax FAX (facsimile) telephone number of
individual
This cluster will be referred to as "USER*" in the template
definitions below.
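To make the prefixing convention concrete, the following Python sketch
expands a "(USER*)" reference such as "Admin-(USER*)" into full data
element names. The names USER_CLUSTER and expand_cluster are
assumptions of this sketch; the element list is copied from the table
above.

```python
# Data element names of the USER* cluster, as listed in the table above.
USER_CLUSTER = (
    "Name", "Organization-Name", "Organization-Type", "Work-Phone",
    "Work-Fax", "Work-Postal", "Job-Title", "Department", "Email",
    "Handle", "Home-Phone", "Home-Postal", "Home-Fax",
)


def expand_cluster(prefix, cluster=USER_CLUSTER):
    """Return the full data element names for a prefixed cluster."""
    return [prefix + "-" + element for element in cluster]
```

Calling expand_cluster("Admin") produces names such as "Admin-Name",
"Admin-Work-Phone" and "Admin-Email".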
3.6.2 ORGANIZATIONS
The following elements apply when describing organizations and are a
subset of those listed above for individuals and groups. Obviously
some of the elements above (such as home phone number) make no
sense when being applied to an organization. As above, the
following may be subcomponents in a larger, hierarchically
structured data element name.
Data Element Name Description
Organization-Name Name of organization
Organization-Type Type of organization to which this
individual or group belongs (University,
commercial organization etc.)
Organization-Postal Postal address of organization
Organization-City City of organization
Organization-State State (province) of organization
Organization-Country Country of organization
Organization-Email Electronic mail address of organization
Organization-Phone Phone number of organization
Organization-Fax Fax number of organization
Organization-Handle Handle of organization
This cluster will be referred to as "ORGANIZATION*" in the
template definitions below.
3.6.3 MISCELLANEOUS
The following is a list of generic data element subcomponents used
when referring to particular resources.
Data Element Name Description
Title A complete title for the resource
Short-Title Summary title
City City of resource
State State (Province, etc) of resource
Country Country of resource
Description Description of resource
Keywords Any keywords which might be applied to the record
that would facilitate users' finding this
resource.
URI Uniform Resource Identifier
3.7 TEMPLATE DEFINITIONS
3.7.1 SITE INFORMATION
This file contains one (1) record with the following fields.
IMPORTANT: There should only be one instance of this file in each
archive.
Filename for this index file (naming scheme 1): AFA-SITEINFO
Fields for this file.
Template-Type: SITEINFO
Host-Name: Primary Domain Name System host name
Alias: Preferred DNS-registered name for the
AFA host. This name must be a valid CNAME
entry in the Domain Name System.
Admin-(USER*):
Contact information of the individual or
group responsible for administering this
site.
Owner-(ORGANIZATION*):
Contact information for the organization
owning this site.
Sponsoring-(ORGANIZATION*):
Contact information for the organization
sponsoring this site.
City: City of the host
State: State (province) of the host
Country: Country of the host
Latitude-Longitude:
Latitude and longitude of site (See Note <1>)
Timezone: Timezone as defined in Section 3.3 above.
Record-Last-Modified-(USER*):
Contact information for individual who
last modified this record
Record-Last-Modified-Date:
The date this record was last modified
Record-Last-Verified-(USER*):
Contact information of person or group
last verifying that this record was
accurate
Record-Last-Verified-Date:
The date this record was last verified
Update-Frequency:
Preferred frequency of retrieval of all
AFA extended configuration information by
automated retrieval tools (See Note <2>)
Access-Times: Time ranges (as defined in Section 3.3) of access
to anonymous FTP users
Access-Policy: Information such as conventions or
restrictions for uploading files to this
site etc.
Description: This field contains text describing any areas of
specialization for this site. For example, if
the site contains information related to the
field of molecular biology, a paragraph or two with
the keywords "molecular biology" and some further
description would be in order. It should also
mention if this site contains "logical" archives.
Keywords: Appropriate keywords describing contents
of this AFA
Notes for this template.
<1> Latitude and longitude are specified in that order as
CDD.MM.SS/CDD.MM.SS
Where
DD is in degrees
MM is in minutes
SS is in seconds
C is the direction designator which is
For latitude
"+" is north of the equator
"-" is south of the equator
For longitude
"+" is west of the Greenwich meridian
"-" is east of the Greenwich meridian
The double quotes (") are not part of the designator, but are
used here to delimit the symbols.
<2> The period is measured in days. This value should be chosen to
reflect the turnover of information at the archive.
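As a hedged illustration of Note <1>, the Python sketch below converts
a Latitude-Longitude value into decimal degrees. The function name
parse_lat_lon is an assumption, as is the choice to accept one to three
digits for degrees and exactly two for minutes and seconds. The result
keeps the sign convention stated above, so a positive longitude means
west of Greenwich.

```python
import re

# CDD.MM.SS/CDD.MM.SS, where C is "+" or "-".
LATLON_RE = re.compile(
    r"^([+-])(\d{1,3})\.(\d{2})\.(\d{2})/([+-])(\d{1,3})\.(\d{2})\.(\d{2})$"
)


def parse_lat_lon(value):
    """Return (latitude, longitude) in signed decimal degrees.

    Latitude is positive north of the equator; longitude is positive
    WEST of the Greenwich meridian, as in the note above.
    """
    match = LATLON_RE.match(value.strip())
    if match is None:
        raise ValueError("not in CDD.MM.SS/CDD.MM.SS form: %r" % value)

    def to_degrees(sign, degrees, minutes, seconds):
        magnitude = int(degrees) + int(minutes) / 60 + int(seconds) / 3600
        return -magnitude if sign == "-" else magnitude

    parts = match.groups()
    return to_degrees(*parts[:4]), to_degrees(*parts[4:])
```

A tool that needs the more common east-positive longitude would have to
negate the second value returned here.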
An example of a SITEINFO record:
Template-Type: SITEINFO
Name: foo.bar.org
Preferred-Name: ftp.bar.org
Admin-Name: John Doe
Admin-Work-Postal: PO Box. 6977, Marinetown, PA 17602
Admin-Work-Phone: +1 717 555 1212
Admin-Work-Fax: +1 717 555 1213
Admin-Email: FTP@bar.org
Owner-Organization-Name: Beyond All Recognition Foundation
City: Lampeter
State: Pennsylvania
Country: USA
Latitude-Longitude: -37.24.43/+121.58.54
Timezone: -0400
Record-Last-Modified-Name: John X. Doe
Record-Last-Modified-Email: johnd@bar.org
Record-Last-Modified-Date: Mon, Feb 10 92 22:43:31 EST
Update-Frequency: 10
Access-Times: 02:00 GMT / 08:00 GMT 18:00 GMT / 21:00 GMT
Access-Policy: Non-proprietary data may be uploaded to
this site in the "incoming" directory.
Please contact site administrators if you
do so. Proprietary material found in this
directory will be removed. This site is
not to be used as a temporary storage
area.
Description: This site contains data relating to DNA
sequencing particularly Yeast chromosome
1. Datasets are available. There is also
a selection of programs available for
manipulating this information.
Keywords: DNA, sequencing, yeast, genome, chromosome
3.7.2 LOGICAL ARCHIVE INFORMATION
Filename for this index file (naming scheme 1): AFA-LARCHIVE
IMPORTANT: The placement of this file in the file structure is
significant: It implies that the directory in which this file
exists and all subdirectories are part of the logical archive.
Any number of these files may exist in the archive, but only one per
directory.
Template-Type: LARCHIVE
Admin-(USER*):
Contact information of the individual or
group responsible for administering this site.
Alias: Preferred DNS-registered name for the AFA host as
this logical archive. This name must be a
valid CNAME entry in the Domain Name System.
Owner-(ORGANIZATION*):
Contact information for the organization owning
this site.
Sponsoring-(ORGANIZATION*):
Contact information for the organization
sponsoring this site.
Record-Last-Modified-(USER*):
Contact information for individual who last
modified this record
Record-Last-Verified-(USER*):
Contact information of person or group
last verifying that this record was
accurate
Record-Last-Verified-Date:
The date this record was last verified
Access-Policy: Information such as conventions or
restrictions for uploading files to this
logical archive.
Description: Contains text describing any area of
specialization for the logical archive
Update-Frequency:
Preferred frequency of retrieval of all AFA
extended configuration information by
automated retrieval tools (See Note <1>)
Keywords: Appropriate keywords describing contents of this
logical AFA
Notes for this record.
<1> The period is measured in days. This value should be chosen to
reflect how often information at the archive changes.
An example of a LARCHIVE record:
Template-Type: LARCHIVE
Owner-Organization: Orymonix Incorporated
Organization-Type: Commercial
Preferred-Name: oxymoron-x.com.uk
Access-Policy: This archive is open to general access
Description: This archive contains essays on Military
Intelligence, Postal Service and
Progressive Conservatism. All material
contained in this archive is in the
public domain
Admin-Name: Ima Admin
Admin-Email: imaa@oxymoron.com.uk
Admin-Work-Phone: +44 71 123 4567
Admin-Work-Fax: +44 71 123 5678
Admin-Postal: 555 Marsden Road, London, SE15 4EE
Record-Last-Modified: Yuri Tolstoy <yt@snafu.com.uk>
Record-Last-Modified-Date: Mon, Jun 21 93 17:03:23 EDT
Update-Frequency: 20
Keywords: Militarism, Post Office, Conservatism
3.8 CONTENT INFORMATION
For the following categories the assumption should not be made that
the information applies to the anonymous FTP host itself. Rather,
the group or organization may publish general information: the
specific information will be contained inside the file describing
the category.
3.8.1 USER INFORMATION
To allow for the use of "handles" and so as not to require the
repetition of the USER* information each time this cluster is
needed in other templates, we define here a USER template in which
the information can be stored in one place. Assuming the use of a
unique handle, other records may then refer to this template to
complete the required information. The definition is simply the data
elements listed in 3.6.1 above.
Filename for this index file (naming scheme 1): AFA-USER.
The Template-Type is USER (naming scheme 2).
3.8.2 ORGANIZATION INFORMATION
In a similar manner to the USER template, the ORGANIZATION template
provides common information which may be used in other (larger)
templates, yielding a central source of information.
Filename for this index file (naming scheme 1): AFA-ORGANIZATION.
The Template-Type is ORGANIZATION (naming scheme 2).
3.8.3 SERVICES INFORMATION
This file contains records with the following fields. In the first
naming scheme each record is started and delimited by the
"Template-Type" field.
Filename for this index file (Naming scheme 1): AFA-SERVICES
Any number of these files may exist in the archive.
Template-Type: SERVICES
Name: Name of service
Host-Name: Host name of host providing service
Host-Port: Port number of service
Protocol: Method required to access service (See
Note <1>)
Admin-(USER*):
Contact information of person or group
responsible for service administration
(administrative contact)
Admin-(ORGANIZATION*):
Information on organization responsible
for this service
Sponsoring-(ORGANIZATION*):
Contact information for the
organization sponsoring this site.
Description: Free text description of service
Authentication: Authentication information. Free text
field supplying login and password
information (if necessary) or other
method for authentication
Registration: How to register for this service if
general access is not available
Charging-Policy:
Free text field describing any charging
mechanism in place. Additionally, fee structure
may be included in this field.
Access-Policy: Policies and restrictions for using this
service
Access-Times: Time ranges for mandatory or preferred access of
service.
Keywords: Keywords appropriate for describing this
service
Record-Last-Modified-(USER*):
Contact information of person or group
last modifying this record
Record-Last-Modified-Date:
The date this record was last modified
Record-Last-Verified-(USER*):
Contact information of person or group
last verifying that this record was
accurate
Record-Last-Verified-Date:
The date this record was last verified
Notes on this file.
<1> The Internet protocol used to communicate with this service.
For example, "telnet" or "SMTP" or "NNTP" etc. A more complete
explanation of specialized protocols (which may not be
generally known) should be supplied in the main description.
Example 1
---------
The following is an example of an entry for a telnet service.
Template-Type: SERVICES
Name: Census Bureau information server
Host-Name: census.ispy.gov
Host-Port: 1234
Protocol: telnet
Admin-Name: Jay Bond
Admin-Postal: PO Box. 42, A Street Washington DC, USA 20001
Admin-Work-Phone: +1-202-222-3333
Admin-Work-Fax: +1 202 444 5555
Admin-Email: jb007@census.ispy.gov
Description: This server provides information from the
latest USA Census Bureau statistics (1990)
Type "help" for more information.
Authentication: Once connected type your email address at
the "login:" prompt. No password is
required.
Registration: No formal registration is required
Charging-Policy: There is no charge for the use of this service
Access-Times: 9:00 EST / 17:00 EST
Access-Policy: This service may not be used by sites in
the Republic of the VTTS
Keywords: census, population, 1990, statistics
Last-Modified-Name: Miss Moneypenny
Last-Modified-Email: m.moneypenny@census.ispy.gov
Last-Modified-Date: Wed, 1 Jan 1970 12:00:00 GMT
Example 2
---------
The following is an example of a mailing list (service).
Template-Type: SERVICES
Name: fishlovers
Host-Name: foo.com
Admin-Name: Ima Adams
Admin-Email: fishlovers-request@foo.com
Protocol: Email to fishlovers@foo.com
Registration: Send mail to the administrative address with your
own email address requesting addition
Description: Discussion list for people who love fish of all types
Address: iafa@cc.mcgill.ca
Keywords: fish, aquarium, marine, freshwater, saltwater
Access-Policy: Any Internet user may subscribe to this mailing list
3.8.4 DOCUMENTS, DATASETS, MAILING LIST ARCHIVES, USENET ARCHIVES,
SOFTWARE PACKAGES, IMAGES AND OTHER OBJECTS
This file contains records with the following fields.
For multi-record files each record is started and the previous
record is delimited by the "Template-Type" field which also
determines the type of object being indexed. Suggestions for these
types include:
Type of Object Template-Type
-------------- --------------
Document DOCUMENT
Image IMAGE
Software Package SOFTWARE
Mailing List Archive MAILARCHIVE
Usenet Archive USENET
Sound File SOUND
Video File VIDEO
Frequently Asked Questions File FAQ
Other names may be constructed as necessary.
Any number of these files may exist in the archive.
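The role of "Template-Type" as a record delimiter can be sketched as
follows. The sketch assumes the file has already been read into a flat
list of (fieldname, data) pairs; the function name
split_on_template_type is hypothetical.

```python
def split_on_template_type(fields):
    """Group (fieldname, data) pairs into records.

    Each "Template-Type" field starts a new record and thereby ends
    the previous one.
    """
    records = []
    for name, value in fields:
        if name == "Template-Type":
            records.append([])
        elif not records:
            raise ValueError("field %r precedes any Template-Type" % name)
        records[-1].append((name, value))
    return records
```

The first pair of each resulting record is its "Template-Type" field,
whose data identifies the type of object being indexed.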
Filename for this index file (naming scheme 1): AFA-OBJECT
Template-Type: See above list
Category: Type of object. See Note <1>
Title: Title of the object
Author-(USER*): Description/contact information about the
authors/creators of the object. These
fields may be repeated as often as is
necessary.
Admin-(USER*): Description/contact information about the
administrators/maintainers of the object.
These fields may be repeated as often as
is necessary.
Record-Last-Modified-(USER*):
Contact information about person last
modifying this record
Record-Last-Modified-Date:
The date this record was last modified
Record-Last-Verified-(USER*):
Contact information of person or group
last verifying that this record was
accurate
Record-Last-Verified-Date:
The date this record was last verified
Version: A version designator for the object
Source: Information as to the source of the
object.
Requirements: Any requirements for the use of the
object. A free text description of any
hardware/software requirements necessary
to use the object
Description: Description (that is, "abstract" in the
case of documents) of the object.
Bibliography: A bibliographic entry for the object
Citation: The citation for the object when used
in other works
Publication-Status: Current publication status of object
(draft, published etc).
Publisher-(ORGANIZATION*):
Description/contact information about
object publisher
Copyright: The copyright statement. Any additional
information on the copying policy may be
included
Creation-Date: The creation date for the object
Discussion: Free text description of possible
discussion forums (USENET groups, mailing
lists) appropriate for this object
Keywords: Appropriate keywords for this object
Format-v*: Formats in which the object is
available (See Note <2>).
Size-v*: Length of object in bytes (octets).
Language-v*: The name of the language in which the
object is written. For documents this
would be the natural language. For
software this would be the programming
language
Character-Set-v*: The character set of the object. This
should be a well-known value, for example
"ASCII" or "ISO Latin-1".
ISBN-v*: The International Standard Book Number of
the object
ISSN-v*: The International Standard Serial Number
of the object
Access-Protocol-v*: Method of access to this object (e.g.,
anonymous FTP, Gopher etc.) as well as
the host on which it resides. Also any
additional information needed to access
the object.
Access-Host-Name-v*: Host on which to access the object
Access-Host-Port-v*: Port on which to access the object. This
may be implied by the Access-Protocol and
so not be necessary
Pathname-v*: The full pathname of this object. This is
operating system specific. This is not
required if naming scheme 2 is used
Last-Revision-Date-v*: Last date that the object was revised
Library-Catalog-v*: Library cataloging information (See
Note <3>).
Notes for this template.
<1> The intention of this field is to define the category of the
object. For example, in the case of documents it could be
"Technical Report", or "Conference Paper" and the name and date
of the conference at which the paper was presented. It may also
be something like "General Guide" or "User manual".
<2> Objects are often available in several formats. Examples for
documents include "PostScript", "ASCII text", "DVI" etc. For
images this may be "gif", "jpeg", "miff" etc.
<3> Library cataloging numbers. In those cases where the number
itself does not contain enough information to determine the
cataloging scheme, the name of the scheme should be included.
Example 1
---------
Example of AFA-OBJECT file for the DOCUMENT type. Note that this
example contains variant information.
Template-Type: DOCUMENT
Title: The Function of Homeoboxes in Yeast
Chromosome 1
Access-Method: These files are available from
ftp.fungus.newu.edu via anonymous FTP.
They are stored in the directory
pub/yeast/chromosome1
Author-Name: John Doe
Author-Email: jdoe@yeast.foobar.com
Author-Home-Phone: +1 898 555 1212
Author-Name: Jane Buck
Author-Email: jane@fungus.newu.edu
Last-Revision-Date: 27 November 1991
Category: Conference paper. Yeastcon, January 1992,
Mushroom Rock, CA, USA
Abstract: Homeoboxes have been shown to have a
significant impact on the expressions of
genes in Chromosome 1 of Bakers' Yeast.
Citation: J. Doe, J. Buck, The function of
homeoboxes in Yeast Chromosome 1, Conf.
proc. Yeastcon, January 1992, Mushroom
Rock, pp. 33-50
Publication-Status: Published
Publisher-Organization-Name: Yeast-Hall
Publisher-Organization-Postal: 1212 5th Avenue NY, NY, 12001
Copyright: The copyright on this document is held by
the authors. It may be freely copied and
quoted as long as the contribution of the
authors is acknowledged
Library-Catalog: LCC 1701D
Keywords: homeobox, yeast, chromosome, DNA,
sequencing, yeastcon
Format-v0: PostScript
Pathname-v0: yeast-homeobox1.ps
Language-v0: English
Size-v0: 18 pages
Format-v1: ASCII (without graphs)
Pathname-v1: yeast-homeobox1.txt
Size-v1: 13 pages
Language-v1: Russian
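The variant fields in the example above can be grouped mechanically.
The Python sketch below assumes that a variant suffix always has the
form "-v" followed by decimal digits, as in "-v0" and "-v1"; that
assumption, and the function name split_variants, are not taken from
the template definition itself.

```python
import re

# A variant field is "<element>-v<number>", for example "Format-v0".
VARIANT_RE = re.compile(r"^(.+)-v(\d+)$")


def split_variants(fields):
    """Separate (fieldname, data) pairs into common and variant fields.

    Returns (common, variants), where common is a list of pairs and
    variants maps each variant number to a dict of its elements.
    """
    common, variants = [], {}
    for name, value in fields:
        match = VARIANT_RE.match(name)
        if match:
            number = int(match.group(2))
            variants.setdefault(number, {})[match.group(1)] = value
        else:
            common.append((name, value))
    return common, variants
```

Applied to the DOCUMENT record above, this would yield two variants,
one describing the PostScript file and one describing the ASCII file,
with fields such as Title and Keywords left in the common list.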
Example 2
---------
This is an example of a software entry. Note the use of the
software maintainer's "handle" instead of the explicit contact
information. This could be used if there was a well-known external
method of resolving this handle.
Template-Type: SOFTWARE
Title: Beethoven's Fifth Player
Version: 67
Author-Name: Ludwig Van Beethoven
Author-Email: beet@romantic.power.org
Author-Fax: +43 1 123 4567
Admin-Handle: berlioz01
Abstract: The program provides the novice to Transitional
Classical-Romantic music a V-window interface
to the author's latest composition
Abstract: V-window based music player
Requirements: Requires the V-Window system version 10 or higher
Discussion: USENET rec.music.classical
Copyright: Freely redistributable for non-commercial use.
Copyright held by author
Keywords: Classical music, V-windows
Format-v0: LZ compressed
Pathname-v0: /pub/Vfifth.tar.Z
Access-Method-v0: Anonymous FTP
Access-Host-Name-v0: power.org
4. CONCLUSION
This document attempts to provide the foundation for a common set of
recommended cataloging practices which may be used on the Internet to
enhance the utility of Anonymous FTP archives, currently the most widely
used and supported mechanism for general information storage and
retrieval. It is intended that these recommendations be flexible enough
to accommodate a broad spectrum of information classes and it is hoped
that they will be widely used and that automated tools will be developed
to use the valuable information that they make available.
----------------------------------------------------------------------
Bibliography
------------
[1] RFC 959 Postel, J.B.; Reynolds, J.K. File Transfer Protocol. 1985
October
[2] "A Guide to Anonymous FTP Site Administration". Work in progress from
the Internet Anonymous FTP Archive Working Group of the IETF.
[3] Internet Draft "draft-ietf-uri-resource-names-00.txt" Work in
Progress from the Uniform Resource Identifier Working Group of the
IETF.
[4] RFC 954 Harrenstien, K.; Stahl, M.K.; Feinler, E.J. NICNAME/WHOIS.
1985 October
[5] RFC 1034 Mockapetris, P.V. Domain names - concepts and facilities.
1987 November
[6] RFC 1036 Horton, M.R.; Adams, R. Standard for interchange of USENET
messages. 1987 December
[7] RFC 1123 Braden, R.T.,ed. Requirements for Internet hosts -
application and support. 1989 October
[8] Internet Draft "Data Element Templates for Internet Information
Objects". Work in progress from the Internet Anonymous FTP Archive
Working Group of the IETF.